Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-44718][SQL] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value #42394

Closed

Conversation

majdyz
Copy link
Contributor

@majdyz majdyz commented Aug 8, 2023

What changes were proposed in this pull request?

Set the column vector default memory mode to depend on the off-heap memory mode flag. This is to prevent a user from using Vectorized-Reader with an on-heap column-vector by default when the off-heap memory mode is enabled on the cluster.

Why are the changes needed?

Avoid the unintentional usage of on-heap memory in vectorized-reader when off-heap memory mode is enabled by the user.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual & existing tests.

@github-actions github-actions bot added the SQL label Aug 8, 2023
@majdyz majdyz changed the title Match ColumnVector memory-mode config default to OffHeapMemoryMode config value [SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value Aug 8, 2023
@HyukjinKwon HyukjinKwon changed the title [SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value [SPARK-44718][SQL] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value Aug 9, 2023
@majdyz majdyz requested a review from HyukjinKwon August 9, 2023 08:24
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@majdyz
Copy link
Contributor Author

majdyz commented Aug 14, 2023

The failing CIs seems to be unrelated test failures.

import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSparkSession

class ConfigColumnVectorModeDefaultSuite extends SharedSparkSession {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably too much to add a new test suite for it. We have tests for the config framework and .fallbackConf is already tested by ConfigEntrySuite. Shall we just remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed :)

@majdyz majdyz requested a review from cloud-fan August 15, 2023 04:07
@cloud-fan
Copy link
Contributor

The failure is unrelated, I'm merging it to master, thanks!

@cloud-fan cloud-fan closed this in afcccb4 Aug 15, 2023
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
…ffHeapMemoryMode config value

### What changes were proposed in this pull request?

Set the column vector default memory mode to depend on the off-heap memory mode flag. This is to prevent a user from using Vectorized-Reader with an on-heap column-vector by default when the off-heap memory mode is enabled on the cluster.

### Why are the changes needed?
Avoid the unintentional usage of on-heap memory in vectorized-reader when off-heap memory mode is enabled by the user.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual & existing tests.

Closes apache#42394 from majdyz/offheap-colvec-mode-default-value.

Lead-authored-by: Zamil Majdy <[email protected]>
Co-authored-by: Zamil Majdy <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
…ffHeapMemoryMode config value

### What changes were proposed in this pull request?

Set the column vector default memory mode to depend on the off-heap memory mode flag. This is to prevent a user from using Vectorized-Reader with an on-heap column-vector by default when the off-heap memory mode is enabled on the cluster.

### Why are the changes needed?
Avoid the unintentional usage of on-heap memory in vectorized-reader when off-heap memory mode is enabled by the user.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual & existing tests.

Closes apache#42394 from majdyz/offheap-colvec-mode-default-value.

Lead-authored-by: Zamil Majdy <[email protected]>
Co-authored-by: Zamil Majdy <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Mrhs121 pushed a commit to Mrhs121/spark that referenced this pull request Apr 17, 2024
…ffHeapMemoryMode config value

### What changes were proposed in this pull request?

Set the column vector default memory mode to depend on the off-heap memory mode flag. This is to prevent a user from using Vectorized-Reader with an on-heap column-vector by default when the off-heap memory mode is enabled on the cluster.

### Why are the changes needed?
Avoid the unintentional usage of on-heap memory in vectorized-reader when off-heap memory mode is enabled by the user.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual & existing tests.

Closes apache#42394 from majdyz/offheap-colvec-mode-default-value.

Lead-authored-by: Zamil Majdy <[email protected]>
Co-authored-by: Zamil Majdy <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit afcccb4)
yaooqinn added a commit that referenced this pull request Jul 2, 2024
…lumnVector.offheap.enabled's doc field

### What changes were proposed in this pull request?

Followup of #42394
```
   * - spark.sql.columnVector.offheap.enabled
     - When true, use OffHeapColumnVector in ColumnarBatch. Defaults to ConfigEntry(key=spark.memory.offHeap.enabled, defaultValue=false, doc=If true, Spark will attempt to use off-heap memory for certain operations. If off-heap memory use is enabled, then spark.memory.offHeap.size must be positive., public=true, version=1.6.0).
     - <value of spark.memory.offHeap.enabled>
     - 2.3.0
```

The doc field shall be interpolated by MEMORY_OFFHEAP_ENABLED.key instead of MEMORY_OFFHEAP_ENABLED. In this PR, we remove the doc redundant doc as it's also can be found in the `MEMORY_OFFHEAP_ENABLED.defaultValueString`
### Why are the changes needed?

docfix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

manually debugging

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47165 from yaooqinn/minor2.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…lumnVector.offheap.enabled's doc field

### What changes were proposed in this pull request?

Followup of apache#42394
```
   * - spark.sql.columnVector.offheap.enabled
     - When true, use OffHeapColumnVector in ColumnarBatch. Defaults to ConfigEntry(key=spark.memory.offHeap.enabled, defaultValue=false, doc=If true, Spark will attempt to use off-heap memory for certain operations. If off-heap memory use is enabled, then spark.memory.offHeap.size must be positive., public=true, version=1.6.0).
     - <value of spark.memory.offHeap.enabled>
     - 2.3.0
```

The doc field shall be interpolated by MEMORY_OFFHEAP_ENABLED.key instead of MEMORY_OFFHEAP_ENABLED. In this PR, we remove the doc redundant doc as it's also can be found in the `MEMORY_OFFHEAP_ENABLED.defaultValueString`
### Why are the changes needed?

docfix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

manually debugging

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#47165 from yaooqinn/minor2.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants